In this paper, based on a weighted projection of the user-object bipartite network, we
study the effects of user tastes on the mass-diffusion-based personalized recommendation
algorithm, where a user's taste or interest is defined by the average degree of the
objects he or she has collected. We argue that the initial recommendation power located
on the objects should be determined by both their degrees and the users' tastes. By
introducing a tunable parameter, the effects of user tastes on the configuration of the
initial recommendation power distribution are investigated. The numerical results
indicate that the presented algorithm could improve the accuracy, measured by the average
ranking score. More importantly, we find that when the data is sparse, the algorithm
should give more recommendation power to the objects whose degrees are close to the
users' tastes, while when the data becomes dense, it should assign more power to the
objects whose degrees are significantly different from the users' tastes.
1. Introduction

With the rapid growth of the Internet and the World Wide Web, a huge amount of
data and resources confronts people with an information overload. There are thousands
of movies, millions of books, and billions of web pages on the web, and
the amount of information is increasing more quickly than our personal processing
abilities. This massive amount of accessible information creates a dilemma:
it is hard for us to effectively filter out the pieces of




14, 2009 15:17 WSPC/INSTRUCTION FILE ijmpc 



2 Jian-Guo Liu et. al. 

information that are most appropriate for us. A landmark for information filtering
is the use of search engines, by which a user could find the relevant web pages by
entering certain keywords. However, a search engine returns the same results
regardless of the users' tastes and interests.

Thus far, the most promising way to efficiently filter out the information overload
is to provide personalized recommendations, which attempt to find objects
likely to be interesting to the target users by extracting the hidden information
from the users' historical selections or collections. Motivated by its significance
for economy and society, the design of efficient recommendation algorithms has
become a common focus for computer science, mathematics, marketing practices,
management science and physics. Various kinds of algorithms have been proposed,
such as collaborative filtering (CF) approaches [4,5,6,7,8], content-based analyses,
network-based algorithms [11,12,13,14], hybrid algorithms [16,17], and so on. For a
review of current progress, see Refs. [18,19] and references therein.

Very recently, some physical dynamics, including mass diffusion (MD) and
heat conduction (HC), have found applications in personalized recommendations.
These algorithms have been demonstrated to be of both high accuracy and
low computational complexity [11,12,13,14]. Since the MD and HC algorithms could be
implemented on the user-object bipartite network, they are also called network-based
algorithms. The network-based algorithm supposes that the objects one user
has collected have the power to recommend new objects to the target user, which
coincides with the definition of reachability. In this paper, we introduce an
improved MD algorithm with a user-taste-dependent initial configuration. Compared
with the uniform initial configuration, the prediction accuracy can be enhanced by
using the user-taste-dependent configuration. More significantly, besides the
prediction accuracy, we find that the data sparsity is an important factor affecting
the algorithm performance. When the sparsity of the user-object bipartite network
is small, in other words, when there are few edges between the users and objects, the
algorithm should pay more attention to the users' habits and tastes, while when
the number of edges in the bipartite network is large, the algorithm should give
more recommendation power to the objects whose degrees are significantly different
from the users' habits. Numerical simulations show that the improved algorithm has
higher accuracy and can provide more diverse and less popular recommendations.

Fig. 1. Illustration of the network-based algorithm. The network-based algorithm could be applied
in the following way. (a) The objects collected by the target user are activated; (b) the heat is
diffused from the activated objects to the users who have collected them; (c) it is then diffused
back from the users to the objects.

2. Mass-diffusion-based personal recommendation 

In a recommender system, each user has voted for or collected some objects. The system
could be described by a bipartite network with two kinds of nodes, users and objects,
in which the users' historical collection or selection behaviors are represented by
the edges connecting the users and objects. Formally, denote the object set as
$O = \{o_1, o_2, \cdots, o_m\}$ and the user set as $U = \{u_1, u_2, \cdots, u_n\}$;
the system can be fully described by a bipartite network with $m + n$ nodes, where
there is an edge between a user and an object if and only if this object is collected
by the user. The bipartite network could be described by an adjacency matrix
$A = \{a_{ij}\} \in R^{m,n}$, where $a_{ij} = 1$ if $o_i$ is collected by $u_j$, and
$a_{ij} = 0$ otherwise. In the MD algorithm, an object-object similarity network
$W = \{w_{\alpha\beta}\}_{m,m}$ is constructed first, where each node represents an
object and two objects are connected if they have been collected simultaneously by at
least one user. Then, for a target user, an amount of recommendation power is set on
each object he has collected, and $w_{\alpha\beta}$ is the proportion of the resource
that $o_\beta$ would like to distribute to $o_\alpha$. In MD, a reasonable assumption
is that the objects that users have collected are what they like. The objects a target
user has collected are regarded as the initial mass source; the activated objects then
redistribute the mass to the users who have collected them before, with users receiving
a level of mass equal to the mean amount possessed by their neighboring objects, and
objects then receiving back the mean of their neighboring users' mass levels. Due to
the sparsity of real data sets, these "physical" descriptions of the algorithm turn out
to be more computationally efficient in practice than constructing and using the object
similarity matrix $W$, and the MD algorithm could be implemented in three steps on the
user-object bipartite network, as shown in Fig. 1(a-c).
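
The three diffusion steps can be sketched directly on the bipartite network, without
building $W$. The following is a minimal illustration rather than the authors' code: it
assumes the usual mass-diffusion rule in which each node splits its resource equally
among its neighbors, and the names `mass_diffusion`, `recommend` and the toy data are
hypothetical.

```python
from collections import defaultdict


def mass_diffusion(collections, target):
    """Diffuse unit resource from the target user's objects and back.

    collections: dict mapping each user to the set of objects collected.
    Returns a dict mapping each reached object to its final resource.
    Assumption: every node splits its resource equally among its neighbors.
    """
    # Object degree k(o): number of users who collected the object.
    object_degree = defaultdict(int)
    for objs in collections.values():
        for o in objs:
            object_degree[o] += 1

    # Step (a): each object collected by the target user gets one unit.
    initial = {o: 1.0 for o in collections[target]}

    # Step (b): each activated object sends equal shares to its users.
    user_resource = defaultdict(float)
    for user, objs in collections.items():
        for o in objs:
            if o in initial:
                user_resource[user] += initial[o] / object_degree[o]

    # Step (c): each user sends equal shares back to its objects.
    final = defaultdict(float)
    for user, amount in user_resource.items():
        k_u = len(collections[user])
        for o in collections[user]:
            final[o] += amount / k_u
    return dict(final)


def recommend(collections, target, length):
    """Return the top uncollected objects, ranked by final resource."""
    scores = mass_diffusion(collections, target)
    candidates = [o for o in scores if o not in collections[target]]
    candidates.sort(key=lambda o: (-scores[o], str(o)))
    return candidates[:length]


# Toy example (hypothetical data): three users, four objects.
toy = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
toy_scores = mass_diffusion(toy, "u1")
toy_list = recommend(toy, "u1", 2)
```

In this sketch the total resource is conserved: the two units placed on the target
user's objects are only redistributed, never created or lost.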

Fig. 2. Average ranking score $\langle r \rangle$ vs. $\beta$ when p = 10, 20, 30 and 40. All the
data points are averaged over ten independent runs with different data-set divisions.

Lind et al. presented a cycle measurement to investigate the clustering property
in bipartite networks. According to the algorithm description and the
cycle definition, the object similarity of the mass-diffusion-based algorithm can be
expressed as

$$w_{\alpha\beta} = \frac{1}{k(o_\beta)} \sum_{l=1}^{n} \frac{a_{\alpha l} a_{\beta l}}{k(u_l)}, \tag{1}$$

where $k(o_\beta) = \sum_{i=1}^{n} a_{\beta i}$ and $k(u_l) = \sum_{i=1}^{m} a_{il}$ denote the
degrees of object $o_\beta$ and user $u_l$, respectively. For a target user $u_i$, in the
simplest case, the initial resource vector $\mathbf{f} = \{f_1, f_2, \cdots, f_m\}^T$ can be set as

$$f_j = a_{ji}. \tag{2}$$

In other words, only the objects user $u_i$ has collected are set unit resource. After
the mass-diffusion process demonstrated in Fig. 1, the final resource vector is

$$\mathbf{f}' = W\mathbf{f}. \tag{3}$$

Sorting the vector $\mathbf{f}'$ in descending order according to the values $f'_j$, the
objects with the highest values are recommended to the target user.
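
The matrix form of the algorithm can be illustrated with a short sketch. It assumes the
standard mass-diffusion similarity $w_{\alpha\beta} = \frac{1}{k(o_\beta)} \sum_l a_{\alpha l} a_{\beta l} / k(u_l)$,
the uniform initial resource $f_j = a_{ji}$ and the product $W\mathbf{f}$; the function names and
the small adjacency matrix are hypothetical and serve only as an example.

```python
def similarity_matrix(a):
    """Build the object-object matrix W from the adjacency matrix a.

    a is an m x n nested list: a[alpha][l] == 1 if object alpha is
    collected by user l, and 0 otherwise.
    """
    m, n = len(a), len(a[0])
    k_obj = [sum(row) for row in a]                            # k(o_alpha)
    k_user = [sum(a[i][l] for i in range(m)) for l in range(n)]  # k(u_l)
    w = [[0.0] * m for _ in range(m)]
    for alpha in range(m):
        for beta in range(m):
            if k_obj[beta] == 0:
                continue  # an uncollected object has nothing to distribute
            total = sum(
                a[alpha][l] * a[beta][l] / k_user[l]
                for l in range(n)
                if k_user[l] > 0
            )
            w[alpha][beta] = total / k_obj[beta]
    return w


def final_resource(a, i):
    """Return W f for target user i with the uniform start f_j = a[j][i]."""
    w = similarity_matrix(a)
    f = [row[i] for row in a]
    m = len(a)
    return [sum(w[alpha][beta] * f[beta] for beta in range(m)) for alpha in range(m)]


# Toy example (hypothetical data): rows are objects a, b, c, d;
# columns are users u1, u2, u3.
A_TOY = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
W_TOY = similarity_matrix(A_TOY)
F_FINAL = final_resource(A_TOY, 0)
```

Each column of the resulting matrix sums to one for every object with nonzero degree,
which reflects that an object distributes all of its resource. Building the full matrix
costs more than diffusing on the network directly, so this form is mainly useful for
checking small cases.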



3. Improved algorithm by considering the user taste effects 

Fig. 3. The optimal $\beta_{opt}$ and the corresponding average ranking score $\langle r \rangle_{opt}$
vs. the sparsity of the training set. All the data points are averaged over ten independent runs
with different data-set divisions.

In the standard MD algorithm, for any user $u_i$, all of the collected objects are
assigned the same recommendation power. Although it already has good algorithmic
accuracy, this uniform configuration may be oversimplified, since it does not consider
the effects of user tastes. In this paper, the user taste is defined by the average
degree of the objects he has collected. The objects whose degrees are close to the user
taste should be assigned more recommendation power. We also notice that most of the
user tastes are less than 100, while the degrees of the popular objects are close to
300. If the recommendation power is assigned according to the distance between the
object degree and the user taste, it will give more power to the popular objects and
weaken the effects of the unpopular objects. In order to balance the objects whose
degrees are larger or less than the user tastes, we present a more complicated
distribution of initial resource in the following way:

$$f_\alpha = a_{\alpha i} I_{\alpha i}, \tag{4}$$

where $I_{\alpha i}$ is defined as follows:

$$I_{\alpha i} = \begin{cases} \left( k(o_\alpha)/\bar{k}(u_i) \right)^{\beta}, & k(o_\alpha) \geq \bar{k}(u_i), \\ \left( \bar{k}(u_i)/k(o_\alpha) \right)^{\beta}, & k(o_\alpha) < \bar{k}(u_i), \end{cases} \tag{5}$$

where $\bar{k}(u_i)$ denotes the average degree of user $u_i$'s collected objects, and $\beta$
is a tunable parameter. Compared with the uniform case, $\beta = 0$, a positive $\beta$
strengthens the influence of the objects whose degrees are larger or less than
$\bar{k}(u_i)$, while a negative $\beta$ strengthens the influence of the objects whose
degrees are close to $\bar{k}(u_i)$.
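
A minimal sketch of the taste-dependent initial resource is given below. It assumes
that the weight of a collected object is the ratio between its degree and the user
taste (the larger of the two divided by the smaller), raised to the power $\beta$, and
that the user taste is the mean degree of the user's collected objects. The function
names and the toy matrix are hypothetical.

```python
def user_taste(a, i):
    """Mean degree of the objects collected by user i (None if there are none).

    a is an m x n nested list with a[alpha][i] == 1 if object alpha is
    collected by user i.
    """
    degrees = [sum(row) for row in a if row[i] == 1]
    if not degrees:
        return None
    return sum(degrees) / len(degrees)


def initial_resource(a, i, beta):
    """Taste-dependent initial resource vector for target user i."""
    taste = user_taste(a, i)
    f = []
    for row in a:
        if row[i] == 0:
            f.append(0.0)  # uncollected objects carry no initial resource
            continue
        k = sum(row)  # object degree, at least 1 because user i collected it
        ratio = k / taste if k >= taste else taste / k
        f.append(ratio ** beta)
    return f


# Toy example (hypothetical data): 3 objects x 3 users, object degrees 3, 1, 2.
A_TASTE = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
]
F_UNIFORM = initial_resource(A_TASTE, 0, 0.0)    # beta = 0: uniform case
F_POSITIVE = initial_resource(A_TASTE, 0, 1.0)   # favors degrees far from taste
F_NEGATIVE = initial_resource(A_TASTE, 0, -1.0)  # favors degrees close to taste
```

Because the ratio is never smaller than one, a positive exponent can only raise the
weight of a collected object and a negative exponent can only lower it; the object whose
degree is farthest from the taste is affected most.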



4. Numerical results 

A benchmark dataset, namely MovieLens, is used to test the improved algorithm.
The MovieLens data is a randomly selected subset of a much larger data set, and consists
of 1682 movies (objects) and 943 users. The users rate movies with discrete ratings
from one to five. We applied a coarse-graining method: a movie is considered to be
collected by a user only if the given rating is larger than 2. The original data contains
$10^5$ ratings, 85.25% of which are $\geq 3$, that is, the user-object (user-movie) bipartite
network after the coarse graining contains 85250 edges. We randomly divide this
data set into two parts: one is the training set, treated as known information, and
the other is the probe set, whose information is not allowed to be used for prediction.
We use a parameter p to control the data density, that is, p% of the ratings are put
into the probe set, and the remainder composes the training set.
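
The data preparation described above can be sketched as follows. This is only an
illustration under stated assumptions: ratings are given as (user, object, rating)
triples, the split is a uniform random shuffle, and the helper names, the seed and the
synthetic ratings are hypothetical (the real MovieLens file is not read here).

```python
import random


def coarse_grain(ratings, threshold=3):
    """Keep a (user, object) edge only if the rating is at least threshold."""
    return [(u, o) for (u, o, r) in ratings if r >= threshold]


def split_edges(edges, p, seed=0):
    """Put p percent of the edges into the probe set, the rest into training."""
    rng = random.Random(seed)  # fixed seed so that a division is reproducible
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_probe = int(len(shuffled) * p / 100)
    return shuffled[n_probe:], shuffled[:n_probe]


# Synthetic ratings (hypothetical): 4 users x 5 objects, ratings from 1 to 5.
RATINGS = [(u, o, (u + o) % 5 + 1) for u in range(4) for o in range(5)]
EDGES = coarse_grain(RATINGS)
TRAIN, PROBE = split_edges(EDGES, p=25)
```

Repeating the split with different seeds gives independent data-set divisions over which
results can be averaged.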

A good recommender method should rank preferable objects to match the users'
tastes. Therefore, the collected objects in the probe set should be placed at the top
of the recommendation lists. The average ranking score is adopted to measure
the accuracy. It is defined as follows. For a target user $u_i$, if the entry $u_i$-$o_j$
is in the probe set, we measure the position of $o_j$ in the ordered list. For example, if
there are $L_i = 10$ uncollected objects for $u_i$, and $o_j$ is the 2nd one from the top, we
say the position of $o_j$ is 2/10, denoted by $r_{ij} = 0.2$. A good algorithm is expected
to rank the probe entries highly, thus leading to small $r_{ij}$. Therefore, the
mean value of the position, $\langle r \rangle$, can be used to evaluate the algorithmic accuracy:
the smaller the average ranking score, the higher the algorithmic accuracy, and vice
versa. The average degree of all recommended objects, $\langle k \rangle$, and the mean value
of the Hamming distance, $S$, are taken into account to evaluate the popularity and
diversity. A smaller average degree, corresponding to unpopular objects, is
preferred since those small-degree objects are hard for users to find by themselves.
The diversity can be quantified by the average Hamming distance, $S = \langle H_{ij} \rangle$, where
$H_{ij} = 1 - Q_{ij}/L$, $L$ is the length of the recommendation list and $Q_{ij}$ is the number
of overlapping objects in $u_i$'s and $u_j$'s recommendation lists. The largest value, $S = 1$,
indicates that the recommendations to all of the users are totally different, while the
smallest value, $S = 0$, means that all of the recommendations are exactly the same.
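
The two measures can be illustrated with a small sketch that reproduces the worked
example above (the 2nd of 10 uncollected objects gives a ranking score of 0.2). The
function names and toy inputs are hypothetical, and the sketch assumes that the score
dictionary contains every object, including those with zero final resource.

```python
def ranking_score(scores, collected, probe_object):
    """Relative position of a probe object among the uncollected objects.

    scores: dict mapping every object to its final resource.
    collected: set of objects the user collected in the training set.
    """
    uncollected = [o for o in scores if o not in collected]
    ranked = sorted(uncollected, key=lambda o: -scores[o])
    return (ranked.index(probe_object) + 1) / len(ranked)


def hamming_distance(list_i, list_j):
    """H_ij = 1 - Q_ij / L for two recommendation lists of equal length L."""
    overlap = len(set(list_i) & set(list_j))
    return 1 - overlap / len(list_i)


def mean_hamming(lists):
    """Average H_ij over all pairs of users' recommendation lists."""
    pairs = [(i, j) for i in range(len(lists)) for j in range(i + 1, len(lists))]
    return sum(hamming_distance(lists[i], lists[j]) for i, j in pairs) / len(pairs)


# Twelve objects with decreasing scores; the user has already collected two,
# so ten uncollected objects remain and "o3" is the 2nd one from the top.
SCORES = {f"o{k}": float(12 - k) for k in range(12)}
R_EXAMPLE = ranking_score(SCORES, {"o0", "o1"}, "o3")

S_EXAMPLE = mean_hamming([["a", "b"], ["a", "b"], ["c", "d"]])
```

Averaging `ranking_score` over all probe entries gives the average ranking score, and
`mean_hamming` over all users' lists gives the diversity $S$.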

Implementing the improved algorithm on the MovieLens data, the accuracy,
popularity and diversity are investigated. Figure 2 reports the algorithmic accuracy
as a function of $\beta$ for different p, from which one can find that the curves obtained
by the improved algorithm have clear minima, which strongly supports the above
discussion. Compared with the routine case ($\beta = 0$), the average ranking score can
be reduced by 5.6% in the optimal case when p = 10. Numerical results on different
percentages of probe sets show that the optimal parameter $\beta_{opt}$ decreases with the
increase of p. Figure 3 reports the relation between the optimal $\beta_{opt}$, the
corresponding average ranking score $\langle r \rangle_{opt}$ and the sparsity of the training sets.
One can see from Fig. 3 that the optimal $\langle r \rangle_{opt}$ is negatively correlated with the
data sparsity, where the sparsity is defined as $E/(mn)$ and $E$ is the number of edges in
the user-object bipartite network. More interestingly, the optimal parameter $\beta_{opt}$
is positively correlated with the sparsity. The reason may lie in the fact that when
the users have not collected many objects, their tastes are easy to distinguish;
therefore, the objects whose degrees are close to the user taste $\bar{k}(u_i)$ should be
assigned more recommendation power. As the number of users' collected objects increases,
users' tastes become diverse, and it is hard to capture the users' interests and
habits. Under these circumstances, the users are more interested in objects
different from their historical collections, which could bring them fresh information.
Fig. 4. $\langle k \rangle$ and $S$ vs. $\beta$ for p = 10, 20, 30 and 40 when the recommendation list
length is L = 10. All the data points are averaged over ten independent runs with different
data-set divisions.

Besides accuracy, the popularity and diversity are also investigated. Figure 4 reports
the average degree and diversity of all recommended movies as a function of $\beta$ for
different p when the recommendation list length is L = 10, from which one can find that
although the average object degrees scarcely change, the diversity is increased at
the optimal $\beta_{opt}$.

5. Conclusion and Discussion 

In this paper, the effects of user tastes on the MD recommendation algorithm are
investigated, where the user taste is defined by the average degree of the objects he or
she has collected. By introducing a free parameter $\beta$, an improved algorithm that
regulates the initial configuration of resource is presented. Numerical results indicate
that when the data set is sparse, it is easy to distinguish the users' tastes and the
objects whose degrees are close to the users' tastes should be assigned more
recommendation power, while as the data set becomes dense, the objects whose degrees
are far from the users' tastes should be emphasized. Besides the average ranking score,
the popularity and personalization of recommended objects are also taken into
account. The results show that the improved algorithm outperforms the standard MD
algorithm in both accuracy and personalization.

In the improved algorithm, we only give one kind of user taste definition. However,
there are several other ways to define the users' tastes, such as time-dependent
behavior, the variance of the degrees of the user's collected objects, and so on. We
believe the MD algorithm could be further improved by capturing the users' current tastes.

Instead of calculating all the elements in $W$, one can implement the current
algorithm by directly diffusing the resource of each user. Ignoring the degree-degree
correlation in user-object relations, the algorithmic complexity is
$O(m \langle k_u \rangle \langle k_o \rangle)$, where $\langle k_u \rangle$ and $\langle k_o \rangle$ denote the
average degrees of users and objects. Theoretical physics
provides us with some beautiful and powerful tools for dealing with this long-standing
challenge in modern information science: how to do a personal recommendation.
The presented algorithm could also be used to find the relevant reviewers for
scientific papers or funding applications, and for link prediction in social and
biological networks. We believe the current work can enlighten readers in this
promising direction.
China (No. 2006CB705500), the National Natural Science Foundation of China